
[test] Add tests for SPMD vLLM #116

Closed
ZSL98 wants to merge 15 commits into verl-project:main from ZSL98:zsl/vllm-spmd


ZSL98 commented Jan 18, 2025

This PR, which tests SPMD vLLM, is at an early stage and is not ready to merge.
For now its purpose is to confirm that main-branch vLLM works with verl's test case (run_fsdp_vllm.py). Below are some baseline comparisons of weight sync duration.

Configuration: 8 × L20 GPUs, TP=4
A. Across Process (broadcast/gloo): run python test_sync_weight_openrlhf.py with vllm_sync_backend = "gloo"; rank 7 broadcasts weights to ranks 0-3 over the gloo backend.
B. Across Process (broadcast/nccl): run python test_sync_weight_openrlhf.py with vllm_sync_backend = "nccl"; rank 7 broadcasts weights to ranks 0-3 over the nccl backend.
C. FSDP+vLLM: the original test case; run 4 workers with torchrun --nproc-per-node=4 run_fsdp_vllm.py
D. FSDP+vLLM (spmd): using vllm='0.6.6.post2.dev252+g8027a724'; run 4 workers with torchrun --nproc-per-node=4 run_fsdp_vllm_spmd.py
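
The PR does not show how the sync durations were measured. As a minimal sketch of one common approach, the snippet below wraps a step in a wall-clock timer. The names `timed` and `results` are hypothetical, and the `time.sleep` call is only a stand-in for the real weight sync call.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Record the wall-clock duration of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # For GPU work, the caller would also need to synchronize the device
        # before the clock is read, otherwise asynchronous kernels or
        # collectives may still be in flight. That step is omitted here.
        results[label] = time.perf_counter() - start


results = {}
with timed("weight_sync", results):
    time.sleep(0.01)  # placeholder for the actual weight sync call

print(f"weight_sync took {results['weight_sync']:.6f} s")
```

Using a context manager keeps the timing code out of the sync logic itself, so the same wrapper could be applied to each of the four configurations.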

The weight sync time (in seconds) was recorded as:

| Model | Across Process (gloo) | Across Process (nccl) | FSDP+vLLM | FSDP+vLLM (spmd) |
| --- | --- | --- | --- | --- |
| Qwen2.5-3B-Instruct | 4.122498 | 1.020927 | 0.236113 | 0.232854 |
| Qwen2.5-7B-Instruct | 13.538473 | 0.789263 | 0.546817 | 0.548971 |
| Meta-Llama-3-8B-Instruct | 12.211754 | 0.700919 | 0.569790 | 0.568038 |

Note that the across-process figures cover only the broadcast step; a complete weight sync would also include weight gathering. FSDP+vLLM and FSDP+vLLM (spmd) are expected to perform identically because the weight sync logic is unchanged. Based on these results, I draw two conclusions:

  1. With SPMD vLLM now available, there's no need to explore cross-process weight synchronization methods.
  2. I will test replacing the original vLLM with the new vLLM in the core logic to verify verl's compatibility.
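
To make the gap behind the first conclusion concrete, the reported timings can be turned into ratios. The sketch below is illustrative arithmetic only: the numbers are copied from the PR's results, while the dictionary keys and the `speedup_over` helper are hypothetical names.

```python
# Weight sync times in seconds, copied from the reported results.
timings = {
    "Qwen2.5-3B-Instruct": {
        "gloo": 4.122498, "nccl": 1.020927,
        "fsdp_vllm": 0.236113, "fsdp_vllm_spmd": 0.232854,
    },
    "Qwen2.5-7B-Instruct": {
        "gloo": 13.538473, "nccl": 0.789263,
        "fsdp_vllm": 0.546817, "fsdp_vllm_spmd": 0.548971,
    },
    "Meta-Llama-3-8B-Instruct": {
        "gloo": 12.211754, "nccl": 0.700919,
        "fsdp_vllm": 0.569790, "fsdp_vllm_spmd": 0.568038,
    },
}


def speedup_over(baseline, target="fsdp_vllm_spmd"):
    """Return baseline_time / target_time for each model."""
    return {model: row[baseline] / row[target] for model, row in timings.items()}


for baseline in ("gloo", "nccl", "fsdp_vllm"):
    for model, ratio in speedup_over(baseline).items():
        print(f"{model}: {baseline} / fsdp_vllm_spmd = {ratio:.2f}x")
```

Because the across-process numbers exclude weight gathering, the ratios against gloo and nccl understate the real difference. The ratio against the original FSDP+vLLM setup stays close to 1, consistent with the unchanged sync logic.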


ZSL98 commented Feb 5, 2025

Moved to #209

ZSL98 closed this Feb 5, 2025
dreamyang-liu pushed a commit to dreamyang-liu/verl-sagemaker that referenced this pull request Feb 21, 2026